Scalable greedy algorithms for transfer learning
Abstract
In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to achieve good performance on a target task. Restricting ourselves to a realistic scenario, we do not assume direct access to the source data; instead, we use source hypotheses trained on them. We propose an efficient algorithm that simultaneously selects relevant source hypotheses and feature dimensions, building on the literature on the best subset selection problem. Our algorithm achieves state-of-the-art results on three computer vision datasets, substantially outperforming both transfer learning and popular feature selection baselines in a small-sample setting. We also present a randomized variant that achieves the same results at a fraction of the computational cost. Finally, we prove that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
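The abstract describes greedy selection over a joint pool of source hypotheses and feature dimensions, in the spirit of best subset selection. The block below is a minimal forward-selection sketch under stated assumptions, not the authors' exact method: each candidate is assumed to be one column evaluated on the target training examples (either a raw feature dimension or a source hypothesis's outputs), each step is assumed to score candidates by a ridge-regularized squared loss, and the randomized variant is assumed to score only a random subset of the remaining candidates per step. The names `greedy_select`, `ridge_objective`, `solve`, `lam`, and `sample_size` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_objective(columns, y, selected, lam):
    """Minimum of ||y - X w||^2 + lam * ||w||^2 with X restricted to the selected columns."""
    cols = [columns[j] for j in selected]
    k = len(cols)
    # Regularized Gram matrix X^T X + lam * I (positive definite for lam > 0).
    gram = [[sum(u * v for u, v in zip(cols[i], cols[j])) + (lam if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(u * t for u, t in zip(cols[i], y)) for i in range(k)]
    w = solve(gram, rhs)
    residual = sum((y[t] - sum(w[i] * cols[i][t] for i in range(k))) ** 2
                   for t in range(len(y)))
    return residual + lam * sum(wi * wi for wi in w)


def greedy_select(columns, y, budget, lam=1e-3, sample_size=None, seed=0):
    """Forward selection: add the candidate column that most lowers the ridge objective.

    If sample_size is set, each step scores only a random subset of the remaining
    candidates (an assumed stand-in for a randomized, cheaper variant).
    """
    rng = random.Random(seed)
    selected, remaining = [], list(range(len(columns)))
    for _ in range(min(budget, len(columns))):
        pool = remaining
        if sample_size is not None and sample_size < len(remaining):
            pool = rng.sample(remaining, sample_size)
        best = min(pool, key=lambda j: ridge_objective(columns, y, selected + [j], lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

As a usage sketch, one would build `columns` as the target feature dimensions followed by each source hypothesis's predictions on the same target examples, then call `greedy_select(columns, y, budget=k)`; indices past the feature block then identify chosen sources. Each step refits one small ridge problem per scored candidate, so sampling candidates (`sample_size`) trades a little selection quality for a proportional cut in per-step cost.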
Similar resources
Scalable Greedy Feature Selection via Weak Submodularity
Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faste...
A Greedy Divide-and-Conquer Approach to Optimizing Large Manufacturing Systems using Reinforcement Learning
Manufacturing is a challenging real-world domain for studying hierarchical MDP-based optimization algorithms. We have recently obtained very promising results using a hierarchical reinforcement learning based optimization algorithm for a 12-machine transfer line. Transfer lines model factory processes in automobile and many other product assembly plants. Unlike domains such as elevator scheduli...
Scalable Inference on Kingman's Coalescent using Pair Similarity
We present a scalable sequential Monte Carlo algorithm and its greedy counterpart for models based on Kingman’s coalescent. We utilize fast nearest neighbor algorithms to limit expensive computations to only a subset of data point pairs. For a dataset size of n, the resulting algorithm has O(n log n) computational complexity. We empirically verify that we achieve a large speedup in computation....
Horizontally Scalable Submodular Maximization
A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity – number of instances that can fit in memory – must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physic...
Supervised Clustering: Algorithms and Application
This work centers on a novel data mining technique we term supervised clustering. Unlike traditional clustering, supervised clustering assumes that the examples are classified and has the goal of identifying class-uniform clusters that have high probability densities. Three representative-based algorithms for supervised clustering are introduced: two greedy algorithms SRIDHCR and SPAM, and an e...
Journal:
- Computer Vision and Image Understanding
Volume 156
Publication date: 2017